
Protein Science

Wiley

Preprints posted in the last 30 days, ranked by how well they match Protein Science's content profile, based on 221 papers previously published here. The average preprint has a 0.07% match score for this journal, so anything above that is already an above-average fit.

1
GEF me a break: the consequences of freezing Rho guanine-nucleotide exchange factor catalytic domains

Anderson, L. K.; Barpal, E.; Mendoza, H.; Cash, J. N.

2026-04-09 biochemistry 10.64898/2026.04.08.717323 medRxiv
Top 0.1%
22.5%

Purified proteins are routinely flash frozen for use in functional and structural studies, providing a convenient way to reproduce results across complex experiments. Rho guanine-nucleotide exchange factors (RhoGEFs) are no exception to this practice, yet the effects of freezing on their activity and stability remain largely uncharacterized. This gap potentially affects the characterization of these important enzymes and how results are interpreted with respect to their prospective use as therapeutic targets. Here, we tested the isolated DH/PH tandems of P-Rex1, P-Rex2, and PRG under different cryoprotectant conditions and monitored activity and thermostability over time after flash freezing. Our results show a clear divergence between the activity of fresh and frozen purified RhoGEF protein samples in as little as one week for some conditions. Specifically, the variability in data collected on frozen samples was greatly increased. Despite these differences, thermostability seems to be preserved for much longer timepoints across RhoGEFs. Moreover, despite eventual changes in both activity and thermostability with respect to freezing, there are no obvious changes in global conformation between fresh and frozen samples of the isolated P-Rex2 DH/PH tandem. From our data, there are few generalizable trends between the different RhoGEFs and no single cryoprotective agent tested was a silver bullet to preserve both activity and thermostability across RhoGEFs. Overall, our findings emphasize the unpredictable effects of freezing RhoGEFs. As such, RhoGEF freezing should be carefully characterized for each protein and critically viewed when comparing analyses between different studies.

2
Global analysis of thermal and chemical denaturation using CheMelt: Thermodynamic dissection of highly thermostable de novo designed proteins

Lampinen, V.; Burastero, O.; Guazzelli, I. P.; Vogele, F.; Pinheiro, F.; Nowak, J. S.; Garcia Alai, M. M.; Kjaergaard, M.

2026-04-09 biophysics 10.64898/2026.04.07.716910 medRxiv
Top 0.1%
21.7%

De novo protein design often produces thermostable proteins that denature above 100 °C, which complicates the analysis of their stability. Thermostable proteins can be unfolded by combined chemical and thermal denaturation followed by global analysis of multiple melting curves. Here, we have developed CheMelt, a new online tool for global analysis of unfolding data via an intuitive graphical user interface. We use nanoscale differential scanning fluorimetry followed by CheMelt data analysis to dissect the combined thermal and chemical denaturation of thirty-five de novo designed protein binders. Fifteen present sufficient fluorescence changes to extract thermodynamic parameters of unfolding. These de novo designed proteins have systematically lower ΔCp and m-values than comparable natural proteins, which implies that they expose fewer hydrophobic residues upon unfolding. We show that a high thermostability of a designed protein does not necessarily imply a high equilibrium stability; and demonstrate the potential of CheMelt in dissecting thermodynamic properties for protein design and engineering.
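The last point of the abstract above, that a high melting temperature need not mean a high equilibrium stability, can be illustrated with a minimal sketch. It assumes the textbook Gibbs-Helmholtz expression combined with a linear denaturant dependence; whether CheMelt fits exactly this model is an assumption, and the function name and all parameter values below are hypothetical, not taken from the preprint.

```python
import math

def delta_g_unfold(temp_k, tm_k, dh_m, dcp, m=0.0, denaturant=0.0):
    """Unfolding free energy (kJ/mol) at temperature temp_k (K).

    Assumed model: Gibbs-Helmholtz with a temperature-independent dCp,
    plus a linear denaturant term (linear extrapolation model).
      tm_k       melting temperature without denaturant (K)
      dh_m       unfolding enthalpy at tm_k (kJ/mol)
      dcp        heat capacity change of unfolding (kJ/mol/K)
      m          m-value (kJ/mol/M)
      denaturant denaturant concentration (M)
    """
    thermal = dh_m * (1.0 - temp_k / tm_k) + dcp * (
        temp_k - tm_k - temp_k * math.log(temp_k / tm_k)
    )
    return thermal - m * denaturant

# Hypothetical parameter sets, chosen only to illustrate the trend.
design_like = delta_g_unfold(298.15, tm_k=383.15, dh_m=200.0, dcp=2.0)
natural_like = delta_g_unfold(298.15, tm_k=343.15, dh_m=400.0, dcp=6.0)
print(f"design-like  (Tm 110 C): {design_like:.1f} kJ/mol")
print(f"natural-like (Tm  70 C): {natural_like:.1f} kJ/mol")
```

With these illustrative numbers, the protein with the higher melting temperature has the lower free energy of unfolding at 25 °C, because its smaller enthalpy and ΔCp flatten the stability curve.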

3
Structural basis for saccharide binding by human RNase 2/EDN, a protein combining enzymatic and lectin properties

Kang, X.; Prats-Ejarque, G.; Boix, E.; Li, J.

2026-03-23 biochemistry 10.64898/2026.03.20.713198 medRxiv
Top 0.1%
19.0%

Human RNase 2 (eosinophil-derived neurotoxin, EDN) is a major eosinophil granule protein of the vertebrate-specific RNase A superfamily and is involved in antiviral response and inflammation. Identifying ligand-binding pockets in EDN is thus relevant to structure-based drug design. In our laboratory we identified by protein crystallography a conserved site at the protein surface that binds carboxylic anion molecules (malonate, tartrate and citrate). Searching for potential biomolecules rich in anion groups and considering a previous report of EDN binding to glycosaminoglycans, we explored the protein's binding to saccharides. Next, EDN crystals were soaked with mono- and disaccharides, and the 3D structures of ten complexes were solved by X-ray crystallography at atomic resolution. We identified protein binding pockets for glucose, fucose, mannose, sucrose, galactose, trehalose, N-acetyl-D-glucosamine, N-acetylmuramic acid, and the sialic acid N-acetylneuraminic acid. A main site for glucose, fucose, and galactose was located adjacent to the identified carboxylic anion site. Secondarily, N-acetylneuraminic acid, N-acetylmuramic acid, sucrose, galactose, and mannose shared another protein surface region. Overall, the saccharides clustered into seven defined sites, outlining a conserved recognition pattern, which was further analysed by molecular modelling. Interestingly, within the RNase A family, we find amphibian RNases that were initially isolated as carbohydrate-binding proteins and named leczymes, combining enzymatic and lectin properties. The present data provide the first systematic structural characterization of a mammalian sugar-binding RNase within the family. The results highlight unique EDN residues that mediate its sugar-specific interactions, of particular interest for a better understanding of the protein's physiological role.
Highlights: structure of RNase 2 in complex with mono- and disaccharides at atomic resolution; identification of RNase 2 unique sugar binding sites; characterization of a mammalian RNase A family enzyme with lectin properties.

4
Structure of human aldehyde oxidase under tris(2-carboxyethyl)phosphine-reducing conditions

Videira, C.; Esmaeeli, M.; Leimkuhler, S.; Romao, M. J.; Mota, C.

2026-03-25 biochemistry 10.64898/2026.03.25.713928 medRxiv
Top 0.1%
12.3%

The importance of human aldehyde oxidase (hAOX1) has increased over the last decades due to its involvement in drug metabolism. Inhibition studies concerning hAOX1 are extensive and a common reducing agent, dithiothreitol (DTT), was recently found to inactivate the enzyme. However, in previous crystallographic studies of hAOX1, DTT was found to be essential for crystallization. To overcome this concern, another reducing agent was used in crystallization trials. Using tris(2-carboxyethyl)phosphine (TCEP), a sulphur-free reducing agent, it was possible to obtain well-ordered crystals from hAOX1 wild type and a variant, hAOX1_6A, which diffracted beyond 2.3 Å. Instead of the typical star-shaped crystals of hAOX1, at pH 4.7, plates are obtained in the orthorhombic space group P22₁2₁ with two molecules in the asymmetric unit. Activity assays with the enzyme incubated with both reducing agents show that, contrary to DTT, TCEP does not lead to irreversible inactivation of the enzyme. The replacement of DTT with TCEP in crystallization of hAOX1 provides a strategy to circumvent enzyme inactivation during crystallographic studies, allowing future applications of new assays, such as time-resolved crystallography.

5
IDPForge: Deep Learning of Proteins with Global and Local Regions of Disorder

De Castro, S.; Zhang, O.; Liu, Z. H.; Forman-Kay, J. D.; Head-Gordon, T.

2026-03-27 biophysics 10.64898/2026.03.25.714313 medRxiv
Top 0.1%
10.4%

Although machine learning has transformed protein structure prediction of folded protein ground states with remarkable accuracy, intrinsically disordered proteins and regions (IDPs/IDRs) are defined by diverse and dynamical structural ensembles that are predicted with low confidence by algorithms such as AlphaFold and RoseTTAFold. We present a new machine learning method, IDPForge (Intrinsically Disordered Protein, FOlded and disordered Region GEnerator), that exploits a transformer protein language diffusion model to create all-atom IDP ensembles and IDR disordered ensembles that maintain the folded domains. IDPForge does not require sequence-specific training, back transformations from coarse-grained representations, or ensemble reweighting, as in general the created IDP/IDR conformational ensembles show good agreement with solution experimental data, and options for biasing with experimental restraints are provided if desired. We envision that IDPForge with these diverse capabilities will facilitate integrative and structural studies for proteins that contain intrinsic disorder, and it is available as an open source resource for general use.

6
A conserved isoleucine gates the diffusion of small ligands to the active site of NiFe CO-dehydrogenase

Opdam, L.; Meneghello, M.; Guendon, C.; Chargelegue, J.; Fasano, A.; Jacq-Bailly, A.; Leger, C.; Fourmond, V.

2026-03-21 biochemistry 10.64898/2026.03.19.713016 medRxiv
Top 0.1%
10.2%

CO dehydrogenases (CODH) are metalloenzymes that reversibly oxidize CO to CO2, at a buried NiFe4S4 active site. The substrates, CO and CO2, need therefore to be transported through the protein matrix to reach the active site. The most likely pathway for intra-protein diffusion is the hydrophobic channel identified in the crystal structures. Here, we use site-directed mutagenesis to study the highly conserved isoleucine 563 of Thermococcus sp. AM4 CODH2. Mutations at this position change the biochemical properties (KM for CO, product inhibition constant, catalytic bias...), and increase the resistance of the enzyme to the inhibitor O2, showing that isoleucine 563 indeed lines the gas channel. The I563F mutation decreases the bimolecular rate constant of inhibition by O2 15-fold, and increases the IC50 20-fold, which is the strongest improvement in O2 resistance reported so far. We show that the size of the introduced amino acids is less important than their flexibility - along with the size of the cavity formed near the active site in the channel. We also conclude that O2 access to the active site cannot be slowed down without also affecting CO diffusion. This tradeoff will have to be considered in further attempts to use site-directed mutagenesis to make CODHs more O2 tolerant.

7
CROWN: Curated Repository Of Well-resolved Noncovalent interactions

Poelmans, R.; Van Eynde, W.; Bruncsics, B.; Bruncsics, B.; Arany, A.; Moreau, Y.; Voet, A. R.

2026-04-01 bioinformatics 10.64898/2026.03.30.714168 medRxiv
Top 0.1%
10.1%

The development of machine learning models for protein-ligand interactions is fundamentally constrained by the quality and diversity of available structural data. Existing databases of protein-ligand complexes present researchers with an unsatisfying trade-off: carefully curated collections such as PDBBind and HiQBind offer high structural reliability but cover only a narrow slice of the Protein Data Bank (PDB), while large-scale resources like PLInder provide broad coverage at the expense of rigorous quality control. Here, we introduce CROWN (Curated Repository Of Well-resolved Non-covalent interactions), a machine learning-ready dataset that reconciles this tension by applying a comprehensive, fully automated preprocessing pipeline to the PLInder database. Starting from 649,915 protein-ligand interaction systems, CROWN applies a series of interleaved quality filters and processing stages addressing crystallographic resolution, ligand identity, pocket completeness, structural repair, interaction quality, and protonation at physiological pH. A distinguishing feature of the pipeline is a final constrained energy minimisation step using custom flat-bottomed restraints, which balances crystallographic evidence with relaxation of intramolecular strain. This step -- absent from existing protein-ligand datasets -- produces structurally uniform complexes by reconciling the heterogeneous refinement practices of different crystallographers and structure determination protocols, without distorting the experimentally observed binding geometry. The resulting dataset of 153,005 complexes represents a roughly four-fold increase in protein and species diversity over PDBBind and HiQBind, while maintaining rigorous structural standards.
Importantly, CROWN adopts a geometry-centric design philosophy that treats the 3D arrangement of atoms at the binding interface as a self-consistent source of information, rather than relying on externally measured binding affinities that cover only a fraction of known structures and introduce well-documented biases. We anticipate that CROWN will serve as a broadly useful resource for training generative models of protein-ligand binding poses, developing scoring functions, and benchmarking interaction prediction methods.

8
Evaluating FoldX5.1 for MAVISp Stability Data Collection

Vliora, A.; Tiberti, M.; Papaleo, E.

2026-04-02 bioinformatics 10.64898/2026.03.31.715598 medRxiv
Top 0.1%
10.1%

MAVISp (Multi-layered Assessment of VarIants by Structure for proteins) is a structure-based framework for facilitating mechanistic interpretation of missense variants, with protein stability as one of its core analytical layers. When software tools are updated, a key consideration for database curation is whether the new version can be adopted without compromising compatibility with existing entries. This study evaluated the effect of replacing FoldX5 with FoldX5.1 on the results of the MAVISp stability workflow. We compared predicted changes in folding free energy for 539,809 shared variants across 119 proteins. We found high overall agreement with a mean Pearson correlation of 0.933 and a mean Cohen coefficient of 0.814. Most proteins showed strong concordance, whereas only three (NUPR1, TSC1, and TMEM127) showed poor agreement. The number of disagreements was higher at sites with low AlphaFold2 confidence for NUPR1 and TSC1. These outliers did not display systematic inter-version bias, as mean shifts in folding free energies between versions were minimal. Collectively, these findings support adopting FoldX5.1 for future MAVISp data collection. We will include a transition period, during which existing entries retain FoldX5 annotations until their scheduled annual update, while new or updated entries are processed with FoldX5.1. To facilitate this transition, the FoldX software version has been added as a new metadata annotation in the MAVISp database.
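The two agreement metrics named in the abstract above can be sketched in a few lines of standard-library Python. This is an illustration only: the three-way classification and its 1.0 kcal/mol cutoff are hypothetical placeholders, not the MAVISp thresholds, and the ΔΔG values are invented.

```python
import math
from collections import Counter

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def classify(ddg, cutoff=1.0):
    """Map a predicted ddG to a label; the cutoff is a hypothetical value."""
    if ddg >= cutoff:
        return "destabilizing"
    if ddg <= -cutoff:
        return "stabilizing"
    return "neutral"

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two labelings, corrected for chance."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / n ** 2
    return (observed - expected) / (1.0 - expected)

# Invented ddG predictions (kcal/mol) for the same variants from two versions.
old_version = [0.2, 1.8, -1.4, 3.1, 0.9]
new_version = [0.3, 1.6, -0.8, 2.9, 1.1]
r = pearson(old_version, new_version)
kappa = cohen_kappa([classify(v) for v in old_version],
                    [classify(v) for v in new_version])
print(f"Pearson r = {r:.3f}, Cohen's kappa = {kappa:.3f}")
```

Pearson correlation measures agreement on the continuous values, while kappa measures agreement on the resulting classes, so a variant that crosses a cutoff lowers kappa even when the numerical change is small.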

9
Sequence determinants of the hypomobility of intrinsically disordered proteins in SDS-PAGE

Garg, A.; Gielnik, M. B.; Kjaergaard, M.

2026-03-25 biophysics 10.64898/2026.03.24.714011 medRxiv
Top 0.1%
9.0%

Proteins with intrinsically disordered regions (IDRs) migrate at a higher apparent molecular weight in sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) complicating their analysis and identification. Here, we investigate the sequence determinants of the hypomobility of IDRs using a series of synthetic low complexity domains. We find that negative charge increases the apparent molecular weight, but neutral polar tracts also have abnormally slow migration. Positive charge and hydrophobic residues decrease the apparent molecular weight, although lysine residues show a biphasic effect with decreased migration at high fractional contents. Combinations of residues show that different sequence contributions to the apparent molecular weight are not additive. The results can be rationalized by the protein-decorated micelle model by considering both SDS binding and the compaction of protein SDS-complexes.

10
Effects of protein interface mutations on protein quality and affinity

de Kanter, J. K.; Smorodina, E.; Minnegalieva, A.; Arts, M.; Blaabjerg, L. M.; Frolenkova, M.; Rawat, P.; Wolfram, L.; Britze, H.; Wilke, Y.; Weissenborn, L.; Lindenburg, L.; Engelhart, E.; McGowan, K. L.; Emerson, R.; Lopez, R.; van Bemmel, J. G.; Demharter, S.; Spreafico, R.; Greiff, V.

2026-03-26 molecular biology 10.64898/2026.03.24.713863 medRxiv
Top 0.1%
8.5%

Accurately modeling antibody-antigen interactions requires distinguishing intrinsic binding affinity ("protein-interaction") from protein biophysical properties ("protein-quality"), including folding, stability, and expression. However, high-throughput mutational measurements commonly used to train and benchmark computational models often conflate these effects, obscuring the true determinants of molecular recognition. Here, we present an experimental and analytical framework to disentangle protein-interaction effects from protein-quality effects in single-domain antibody (VHH)-antigen binding. Using a large-scale deep mutational scanning (DMS) dataset spanning four VHH-antigen complexes, with single and double mutations in both partners, we introduce control binders to quantify protein-quality changes independently of protein-interaction. This enables decomposition of experimentally measured affinity into protein-interaction and protein-quality components at scale. Leveraging the disentangled dataset, we evaluated state-of-the-art structure- and sequence-based models for protein-quality and protein-interaction prediction and show that their performance largely reflects protein-quality rather than protein-interaction effects. Our results highlight a major confounder in current datasets and suggest that accounting for protein-quality will be essential for training next-generation affinity-prediction models.

Nomenclature

Antibody-related terms
Primary VHH: The VHH of a VHH-antigen complex for which the paratope and the epitope were mutated.
Control VHH: A second VHH that binds to the same antigen as the primary VHH but has non-overlapping epitope positions and therefore does not bind to any of the mutated antigen positions.

Affinity-related terms
Real affinity: "The strength of the interaction between two [...] molecules that bind reversibly (interact)" [1]. In the context of antibody-antigen binding, it quantifies interactions between active proteins (which are expressed and correctly folded [2] and are therefore functionally and biologically active; see below). It is commonly quantified by the equilibrium dissociation constant, KD.
Observed affinity (°KD): The interaction strength experimentally measured between two molecules. Unlike real affinity, this value is confounded by the biophysical properties of the individual binding partners, specifically their folding, stability, and expression levels. Consequently, the observed affinity often differs from the real/intrinsic affinity if a significant fraction of the protein population is inactive [3]. NOTE: Unless otherwise specified, °KD is reported in - log10 space. For example, a °KD of -9 corresponds to 10^-9 M, or 1 nM.
Change in observed affinity (Δ°KD): The shift in the observed affinity between two proteins upon mutation, reported as the log10-transformed fold change. A value of 1 reflects a 10-fold difference, a value of 2 a 100-fold difference, etc. This aggregate change resolves into two distinct biophysical components [2, 4]:
  Protein-interaction change: The change in the intrinsic thermodynamic affinity between the two binding partners, each in its active state (i.e., the specific change in interface Gibbs free energy because both enthalpy and entropy are considered).
  Protein-quality change: The change in the fraction of the mutated protein population that is biologically active, meaning it is expressed, correctly folded, and stable [2, 5].
    Folding: The process that guides the polypeptide chain toward its native conformation, which is a prerequisite for forming a functional binding site.
    Stability: The thermodynamic capacity to maintain the folded structure over time and under physiological conditions. Stability (decrease in Gibbs free energy from the unfolded to the folded state) ensures the binding interface remains intact and prevents competing processes such as aggregation [6].
    Expression: The steady-state abundance of the protein. This is largely dependent on proper folding and stability, as cellular quality control mechanisms degrade proteins that fail to fold or remain stable at functional concentrations.
Change in relative affinity (ΔΔ°KD): The difference between the Δ°KD of the primary VHH compared to the control VHH for a given epitope mutation.

Model-related terms
ESM-IF1 sc: Single-chain (sc) structure-conditioned inverse folding model (ESM-IF1), using the isolated monomer structure of the mutated protein: either the VHH or the antigen [7].
ESM-IF1 mc: Multi-chain (mc) structure-conditioned model (ESM-IF1), using the full complex structure (both antibody and antigen) [7].
Stability prediction score: Score that represents the predicted change in stability based on a single mutation, normally represented as ΔΔG.
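The arithmetic behind the affinity terms defined above can be sketched as follows. The sketch follows the worked example in the note (a reported value of -9 corresponding to 10^-9 M), so it treats the reported number as log10 of KD in molar. That sign convention, the mutant-minus-wild-type direction of the change, the function names, and all numbers are assumptions made for illustration.

```python
import math

def log_kd_to_molar(log_kd):
    """Convert a log10-space observed KD to molar units (assumed convention)."""
    return 10.0 ** log_kd

def delta_log_kd(mutant_log_kd, wildtype_log_kd):
    """Change in observed affinity in log10 units (assumed: mutant - wild type)."""
    return mutant_log_kd - wildtype_log_kd

def relative_change(delta_primary, delta_control):
    """Change in relative affinity: primary-VHH change minus control-VHH change.

    The control VHH binds away from the mutated epitope, so its change is
    taken to reflect protein-quality effects of the antigen mutation.
    """
    return delta_primary - delta_control

# Invented values for one epitope mutation.
primary = delta_log_kd(mutant_log_kd=-7.0, wildtype_log_kd=-9.0)   # 2.0
control = delta_log_kd(mutant_log_kd=-8.5, wildtype_log_kd=-9.0)   # 0.5
print(f"wild-type KD: {log_kd_to_molar(-9.0):.1e} M")
print(f"primary change: {primary:.1f} log10 units ({10 ** abs(primary):.0f}-fold)")
print(f"relative change: {relative_change(primary, control):.1f} log10 units")
```

In this invented case the primary VHH loses 100-fold in observed affinity, of which roughly half a log unit is shared with the control VHH and would be attributed to protein quality rather than to the interface.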

11
Evaluating codon optimization strategies for mammalian glycoprotein production with an open-source expression vector

Yang, C.; Soni, R.; Visconti, S. E.; Abdollahi, M.; Belay, F.; Ghosh, A.; Duvall, S. W.; Walton, C. J. W.; Meijers, R.; Zhu, H.

2026-03-20 molecular biology 10.64898/2026.03.18.712111 medRxiv
Top 0.1%
8.4%

Efficient production of human proteins for the development of tool compounds and biologics depends on a detailed understanding of the protein expression machinery in mammalian cells. Codon optimization is widely believed to enhance protein yield, yet its impact in homologous mammalian systems remains poorly defined. Here, we systematically compare five codon usage strategies reflecting common assumptions about rare codons, RNA stability, and synthesis efficiency. We developed pTipi, an efficient open-source mammalian expression vector, and evaluated its performance in antibody production. We generated plasmids for common epitope tag antibodies such as V5, anti-biotin and anti-His for distribution by Addgene. To compare codon usage schemes, we performed a bake-off of 18 human and murine Wnt pathway glycoproteins in mammalian cells. Small-scale expression screens revealed that codon optimization did not provide a general advantage over native coding sequences, while strategies prioritizing RNA stability consistently reduced expression. Interestingly, a skewed codon scheme using the most abundant codons produced yields comparable to native sequences and occasionally enhanced protein output. To enable flexible evaluation of codon strategies, we implemented a Golden Gate-compatible pTipi platform for efficient synthetic gene incorporation. We conclude that native codons are sufficient for robust homologous mammalian expression of glycoproteins, while selective codon skewing can be beneficial for some targets.

12
Statistical signals indicate a dependence between amino acid backbone conformation and the translated synonymous codon

Rosenberg, A.; Marx, A.; Bronstein, A. M.

2026-04-06 bioinformatics 10.64898/2026.04.02.712692 medRxiv
Top 0.1%
7.2%

Synonymous codons encode the same amino acid but can differ in their usage and translational properties. In previous work we reported statistical differences in backbone dihedral angle distributions associated with synonymous codons in the Escherichia coli proteome. This finding has been questioned due to concerns regarding the statistical methodology used. Here we revisit the dataset using corrected statistical procedures and alternative statistical tests. Across multiple frameworks, the real dataset consistently shows an excess of small p-values relative to randomized controls, indicating detectable codon-associated differences in backbone conformation.

13
Decoupling Topology from Geometry: Detecting Large-Scale Conformational Changes via Conformational Scanning

Lin, R.; Ahnert, S. E.

2026-03-31 bioinformatics 10.64898/2026.03.28.714756 medRxiv
Top 0.2%
6.2%

Protein function is fundamentally driven by structural dynamics, yet the majority of structural bioinformatics treats proteins as static rigid bodies. While Molecular Dynamics (MD) simulations attempt to capture these motions, they are computationally prohibitive for exploring large-scale conformational changes, such as domain movements or allostery, which occur on timescales often inaccessible to standard simulation. However, the Protein Data Bank (PDB) contains a latent wealth of dynamic information in the form of redundant entries: proteins solved in multiple distinct conformational states. Detecting these "shape-shifting" pairs remains challenging because standard structural alignment algorithms (e.g., TM-align) rely on rigid-body superposition, which fails when substantial geometric rearrangement occurs. In this study, we introduce a high-throughput method to systematically mine the PDB for proteins that share identical topology but exhibit divergent tertiary conformations. By utilizing a coarse-grained Secondary Structure Element (SSE) representation, we decouple topological connectivity from geometric rigidity, allowing for the detection of conformational homologues that share low global structural similarity despite high predicted structural similarity. We applied this "conformational scanning" across the entire RCSB database, identifying a curated dataset of proteins undergoing significant structural rearrangements. This work bridges the gap between static structural data and dynamic function, providing a critical "ground truth" dataset for benchmarking data-driven protein design and checking the plausibility of generative structure models.

14
GYDE: A collaborative drug discovery platform for AI-powered protein design and engineering

Down, T.; Warowny, M.; Walker, A.; DAscenzo, L.; Lee, D.; Zhou, Z.; Cao, S.; Bainbridge, T. W.; Nicoludis, J. M.; Harris, S. F.; Mukhyala, K.

2026-03-27 bioinformatics 10.64898/2026.03.24.714039 medRxiv
Top 0.3%
4.8%

As computational tools and machine learning models for protein sciences continue to advance and proliferate, bench scientists face increasing technical challenges adopting these tools for specific applications such as drug discovery. Here we present GYDE (Guide Your Design and Engineering), an open-source, versatile, and web-based collaboration platform designed to make computational analyses of proteins and antibodies easily accessible to bench scientists. GYDE enables the exploration of sequence-structure-function relationships through a tightly integrated visual interface, offering researchers a comprehensive exploration of protein functional determinants either via real assay data or computational tools. GYDE's intuitive interface facilitates seamless access to cutting-edge AI models for protein and antibody structure prediction, design, and downstream analyses. The flexible and easy addition of new tools and models is facilitated by the use of the Slivka compute API. The platform supports saved sessions that enable researchers to easily share their findings with other users, fostering a more collaborative scientific community. GYDE is freely available for protein scientists in academia and industry to build drug discovery analytics platforms customized to their needs.

15
Comparative Unfolding of the Trp-cage Miniprotein in Anionic and Cationic Surfactants

Nnyigide, O. S.; Byeon, H.; Okpete, U. E.

2026-04-09 biochemistry 10.64898/2026.04.08.717321 medRxiv
Top 0.3%
4.8%

The conformational dynamics of a model cationic protein in water and in the presence of anionic sodium dodecyl sulphate (SDS) and cationic cetyltrimethylammonium bromide (CTAB) surfactants at different concentrations were investigated using all-atom molecular dynamics simulations. Free-energy landscapes constructed along principal components reveal a compact, well-defined native basin at 25 °C in water, whereas elevated temperature (100 °C) induces a broadening of the conformational space and the emergence of multiple metastable states. The presence of surfactants further modulates this behavior in a concentration-dependent manner. Cluster population analysis shows that SDS promotes a highly heterogeneous ensemble characterized by reduced dominance of the native-like cluster, while CTAB partially protects the protein from thermal denaturation at higher concentrations. Radial distribution functions demonstrate strong accumulation of SDS headgroups around the protein and pronounced insertion of SDS alkyl tails into hydrophobic protein regions, indicating direct hydrophobic destabilization and micelle-assisted unfolding. In contrast, CTAB exhibits weaker headgroup association owing to electrostatic repulsion and reduced tail-hydrophobic contacts, suggesting a less disruptive interaction mechanism. At high concentration, CTAB aggregates provide a structured hydrophobic environment that stabilizes the folded state and suppresses denaturation. Together, these results provide a molecular-level picture of how surfactant chemistry and concentration govern the conformational stability of a cationic protein, highlighting the dominant role of hydrophobic interactions in surfactant-induced denaturation at high temperature.

16
MartiniSurf: Automated Simulations of Surface-Immobilized Biomolecular Systems with Martini

Jimenez Garcia, J. C.; Lopez-Gallego, F.; Lopez, X.; De Sancho, D.

2026-03-30 biophysics 10.64898/2026.03.27.714767 medRxiv
Top 0.3%
4.4%

The rational design of biomolecule immobilization strategies requires molecular-level understanding of how surface properties, tethering geometry, and structural dynamics jointly influence stability and function. Recently, coarse-grained molecular dynamics simulations based on the Martini force field have emerged as an efficient framework for studying enzyme-surface interactions. However, the reproducible construction of immobilized systems with controlled orientations remains technically challenging, usually involving multiple computational tools. Here we present MartiniSurf, an open-source command-line framework for the preparation of protein and DNA systems immobilized on solid supports within the Martini paradigm. MartiniSurf integrates automated structure retrieval and cleaning, coarse graining via tools from the Martini force field software ecosystem, customizable surface generation, and biomolecule orientation based on user-defined anchoring residues, producing complete GROMACS-ready simulation systems. The framework supports both implicit restraint-based anchoring and explicit linker-mediated immobilization, including surfaces functionalized with user-defined ligands or linker-like moieties, enabling representation of mono- and multivalent attachment geometries at different modeling resolutions. Structure-based GōMartini potentials can be incorporated for proteins, while DNA systems are modeled using Martini 2. Optional substrate insertion, pre-coarse-grained complex handling, and automated solvation and ionization further extend system flexibility. By integrating these components into a unified workflow, MartiniSurf enables systematic and high-throughput in silico exploration of surface-tethered biomolecules and provides a robust computational platform for rational immobilization studies.
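The orientation step described in the MartiniSurf abstract (placing a biomolecule so that user-defined anchoring residues face the support) can be illustrated with a small geometric sketch. This is not MartiniSurf's code or interface: the function names, the `gap` parameter, and the assumption that the surface lies in the plane z = 0 with the biomolecule above it are all hypothetical choices made for illustration.

```python
import numpy as np


def rotation_aligning(a, b):
    """Return the 3x3 rotation matrix that turns direction a onto direction b."""
    a = np.asarray(a, dtype=float) / np.linalg.norm(a)
    b = np.asarray(b, dtype=float) / np.linalg.norm(b)
    c = float(a @ b)
    if np.isclose(c, -1.0):
        # Antiparallel case: rotate by 180 degrees about any axis perpendicular to a.
        axis = np.cross(a, [1.0, 0.0, 0.0])
        if np.linalg.norm(axis) < 1e-8:
            axis = np.cross(a, [0.0, 1.0, 0.0])
        axis /= np.linalg.norm(axis)
        return 2.0 * np.outer(axis, axis) - np.eye(3)
    v = np.cross(a, b)
    k = np.array([[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]])
    # Rodrigues formula written in terms of the cross product v and cosine c.
    return np.eye(3) + k + k @ k / (1.0 + c)


def orient_on_surface(coords, anchor_idx, gap=0.5):
    """Rotate and shift bead coordinates so the anchors face a surface at z = 0.

    coords     : (n_beads, 3) array of positions (hypothetical input)
    anchor_idx : indices of the anchoring beads chosen by the user
    gap        : assumed height of the anchor centroid above the surface,
                 in the same length units as coords
    Assumes the anchor centroid does not coincide with the overall centroid.
    """
    coords = np.asarray(coords, dtype=float)
    centered = coords - coords.mean(axis=0)
    # Direction from the molecular centroid to the centroid of the anchors.
    direction = centered[anchor_idx].mean(axis=0)
    rot = rotation_aligning(direction, [0.0, 0.0, -1.0])
    placed = centered @ rot.T
    # Lift the molecule so the anchor centroid sits at height `gap`.
    placed[:, 2] += gap - placed[anchor_idx, 2].mean()
    return placed
```

The rotation is rigid, so internal distances are unchanged; only the pose relative to the assumed surface plane differs. A real workflow would also need to handle steric overlap with the surface, which this sketch ignores.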

17
Visualize, Explore, and Select: A protein Language Model-based Approach Enabling Navigation of Protein Sequence Space for Enzyme Discovery and Mining

Moorhoff, F.; Medina-Ortiz, D.; Kotnis, A.; Hassanin, A.; D. Davari, M.

2026-03-25 bioinformatics 10.64898/2026.03.23.712833 medRxiv
Top 0.3%
4.3%

The rapid expansion of protein sequence databases continues to outpace functional characterization, creating a persistent bottleneck in enzyme discovery and mining, particularly in large, heterogeneous, and sparsely annotated sequence spaces. This gap is amplified by visualization challenges and the lack of informed strategies for exploration, selection, and mining across sequence spaces. Here, we present an embedding-based workflow, implemented in a computational platform called SelectZyme, for alignment-free visualization and exploration of protein sequence space that combines protein language model (pLM) representations with dimensionality reduction and hierarchical density-based clustering. The approach links complementary visualizations of protein sequence space as low-dimensional landscapes, connectivity projections (minimum spanning trees), and dendrogram-based organization, enabling coherent interactive exploration and candidate selection across global context and local neighborhoods without relying on sequence-identity thresholds, EC numbers, conserved motifs, or predefined functional annotations. Across distinct case studies, we demonstrate that embedding-defined neighborhoods remain structurally conserved even when sequence identity falls within the twilight zone, and that coherent functional organization also emerges for modular protein segments in a fully unsupervised analysis. We also show how this workflow supports user-friendly, interactive, and scalable enzyme mining in a sparsely annotated, complex multi-family protein space exceeding 100,000 sequences, enabling constrained candidate selection around experimentally validated anchors.
By enabling interactive exploration across visualizations and supporting informed candidate selection, our workflow streamlines biocatalyst discovery and helps to bridge uncharacterized sequence space and functional characterization campaigns, thus providing a broad starting point for downstream protein engineering, machine-learning-guided design cycles, and iterative experimental screening campaigns.
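One of the connectivity views named in the SelectZyme abstract, a minimum spanning tree over sequence embeddings, can be sketched in a few lines. This is only an illustration and not SelectZyme's implementation: the Euclidean metric, the function name, and the use of generic numeric vectors in place of real pLM embeddings are assumptions.

```python
import numpy as np


def minimum_spanning_tree(embeddings):
    """Prim's algorithm on the complete Euclidean distance graph.

    embeddings : (n_sequences, dim) array, standing in for pLM embeddings
    Returns a list of (parent, child, distance) edges, n_sequences - 1 in total.
    """
    x = np.asarray(embeddings, dtype=float)
    n = len(x)
    # Dense pairwise distances; fine for a sketch, too memory-hungry for
    # very large sequence sets.
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best_dist = dist[0].copy()            # cheapest known link into the tree
    best_parent = np.zeros(n, dtype=int)  # tree node providing that link
    edges = []
    for _ in range(n - 1):
        # Pick the closest node that is not yet in the tree.
        candidates = np.where(in_tree, np.inf, best_dist)
        j = int(np.argmin(candidates))
        p = int(best_parent[j])
        edges.append((p, j, float(dist[p, j])))
        in_tree[j] = True
        # The new tree node may offer cheaper links to the remaining nodes.
        closer = dist[j] < best_dist
        best_dist = np.where(closer, dist[j], best_dist)
        best_parent = np.where(closer, j, best_parent)
    return edges
```

The resulting edge list connects every sequence to its nearest already-connected neighbor, which is one way to trace a path from an experimentally validated anchor to nearby uncharacterized candidates without setting a sequence-identity threshold.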

18
Synonymous coding significantly affects the domain swapping propensity of myoglobin

Marx, A.; Dor, S.

2026-04-06 biochemistry 10.64898/2026.04.02.716112 medRxiv
Top 0.3%
4.0%

Co-translational folding is a critical, yet poorly understood, aspect of protein biogenesis due to its transient, heterogeneous, and experimentally inaccessible nature. Using a myoglobin variant engineered towards increased domain swapping, we show that stable dimers formed during heterologous E. coli expression revert to the monomeric state following denaturation-renaturation, and that domain swapping propensity is significantly affected by synonymous coding. Wider implications for the role of synonymous coding in aggregation and disease are discussed.

19
ABAG-Rank: Improving Model Selection of AlphaFold Antibody-Antigen Complexes by Learning to Rank

Tadiello, M.; Ludaic, M.; Viliuga, V.; Elofsson, A.

2026-03-19 bioinformatics 10.64898/2026.03.17.712376 medRxiv
Top 0.3%
3.7%

Motivation: AlphaFold has transformed structural biology with unprecedented accuracy in modeling protein structures and their interactions with biomolecules, with AlphaFold3 (AF3) achieving state-of-the-art performance. However, AF3 and other methods often struggle to accurately predict the structure of protein complexes that lack strong co-evolutionary information, such as antibody-antigen (Ab-Ag) complexes. One of the fundamental issues is that AF3 often generates accurate predictions, but fails to reliably distinguish them from the much larger set of incorrect ones. Results: To address this, we propose ABAG-Rank, a deep neural network that provides an efficient and robust solution for model selection of Ab-Ag interactions from a pool of structural ensembles predicted with AlphaFold. Built on the permutation-invariant DeepSets architecture, ABAG-Rank can process variable-sized ensembles of structural decoys and is directly applicable to prediction settings in which the number of candidates may vary. We train a model on a redundancy-reduced set of all known antibody-antigen complexes and find that simple geometric descriptors, along with confidence scores from AlphaFold, provide rich information about interface quality without requiring intensive physics-based calculations. Our experiments demonstrate that ABAG-Rank significantly outperforms AF3 internal scoring and the ranking performance of existing deep learning baselines. Implementation: Source code can be found at: https://github.com/tadteo/ABAG-Rank
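The DeepSets idea mentioned in the ABAG-Rank abstract, scoring each decoy in the context of a variable-sized ensemble without depending on decoy order, can be sketched as follows. This is not the ABAG-Rank model: the layer sizes, the mean pooling, the random untrained weights, and all function and parameter names are assumptions for illustration only.

```python
import numpy as np


def init_params(n_features, hidden=16, seed=0):
    """Random, untrained weights for a toy DeepSets-style scorer."""
    rng = np.random.default_rng(seed)
    return {
        "W_phi": rng.normal(scale=0.1, size=(n_features, hidden)),
        "b_phi": np.zeros(hidden),
        "w_rho": rng.normal(scale=0.1, size=2 * hidden),
    }


def score_ensemble(features, params):
    """Score every decoy in one ensemble.

    features : (n_decoys, n_features) array, e.g. geometric descriptors and
               confidence scores per decoy (hypothetical inputs)
    Returns an (n_decoys,) array of scores; higher means ranked better.
    """
    features = np.asarray(features, dtype=float)
    # phi: the same embedding is applied to each decoy independently.
    h = np.tanh(features @ params["W_phi"] + params["b_phi"])
    # Mean pooling over the set gives an order-independent ensemble summary
    # and works for any number of decoys.
    context = h.mean(axis=0)
    # rho: each decoy is scored from its own embedding plus the shared summary.
    joint = np.concatenate([h, np.broadcast_to(context, h.shape)], axis=1)
    return joint @ params["w_rho"]
```

Because the only interaction between decoys goes through the pooled summary, shuffling the input rows shuffles the scores in the same way, and the same parameters apply to ensembles of any size. In practice such a scorer would be trained with a ranking loss, which this sketch omits.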

20
UBL3 UBL domain exhibits distinct helix-centered dynamic control among ubiquitin-like proteins

Matsuda, K.; Moriya, Y.; Xu, L.; Ohmagari, R.; Aramaki, S.; Zhang, C.; Baba, A.; Hirayama, S.; Kahyo, T.; Setou, M.

2026-04-08 bioinformatics 10.64898/2026.04.06.716645 medRxiv
Top 0.4%
3.7%

Ubiquitin-like protein 3 (UBL3) is a post-translational modifier that sorts proteins into small extracellular vesicles and regulates the trafficking of disease-associated proteins such as α-synuclein. The structural and dynamic features of the UBL domain that underlie these functions, however, remain poorly understood. Here we performed in silico structural dynamics analysis of the UBL3 UBL domain using an NMR structure ensemble combined with anisotropic network modeling (ANM) and perturbation response scanning (PRS). Principal component analysis and residue-wise fluctuation analysis consistently revealed high flexibility in the C-terminal region of UBL3. Comparative ANM analysis across 20 ubiquitin-like proteins (UBLs) further showed that C-terminal flexibility is a conserved yet variable property within the UBL family. PRS analysis demonstrated that residues forming the central α-helix of the β-grasp fold exert greater dynamic control over collective motions than β-sheet residues. Notably, UBL3 displayed the highest helix/sheet PRS effectiveness ratio among all UBLs analyzed, highlighting the prominent dynamic contribution of helix residues in this domain. Together, these results provide a structural basis for understanding UBL3-dependent protein interactions and disease-related mechanisms, and suggest that helix-centered dynamic control in the UBL domain may represent a potential target for modulating UBL3 function.
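The anisotropic network model used in the UBL3 abstract has a compact standard form: residues are nodes joined by identical springs within a distance cutoff, and residue fluctuations follow from the pseudo-inverse of the resulting Hessian. The sketch below shows that generic construction only. The cutoff of 15 Å, the uniform spring constant, and the function names are assumptions, not the settings of the study, and the PRS step is not shown.

```python
import numpy as np


def anm_hessian(coords, cutoff=15.0, gamma=1.0):
    """Build the 3N x 3N ANM Hessian from one coordinate set.

    coords : (n_residues, 3) array, e.g. C-alpha positions in angstroms
    cutoff : assumed interaction cutoff; gamma : assumed uniform spring constant
    """
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    hessian = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            r2 = d @ d
            if r2 > cutoff ** 2:
                continue
            # Off-diagonal super-element for the spring between i and j.
            block = -gamma * np.outer(d, d) / r2
            hessian[3 * i:3 * i + 3, 3 * j:3 * j + 3] = block
            hessian[3 * j:3 * j + 3, 3 * i:3 * i + 3] = block
            # Diagonal super-elements are minus the sum of the off-diagonal ones.
            hessian[3 * i:3 * i + 3, 3 * i:3 * i + 3] -= block
            hessian[3 * j:3 * j + 3, 3 * j:3 * j + 3] -= block
    return hessian


def residue_fluctuations(hessian, n_rigid=6):
    """Relative mean-square fluctuation per residue (arbitrary units).

    Skips the n_rigid lowest modes, which are the zero-frequency rigid-body
    translations and rotations of a well-connected network.
    """
    vals, vecs = np.linalg.eigh(hessian)  # eigenvalues in ascending order
    vals, vecs = vals[n_rigid:], vecs[:, n_rigid:]
    # Diagonal of the pseudo-inverse, summed over x, y, z for each residue.
    cov_diag = np.sum(vecs ** 2 / vals, axis=1)
    return cov_diag.reshape(-1, 3).sum(axis=1)
```

Residues with few neighbors within the cutoff, as is typical of chain termini, tend to receive larger fluctuation values in this kind of model, which is the sort of signal the abstract reports for the C-terminal region.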